# Replication Data for: "Hug Fans or Follow Celebrities? How Nationalism Is Reinforced on Chinese Social Media"

**Version 1.1**  
Ji Yeon Hong; Yong H. Kim; Han Zhang; Tianzhu Qin, 2025,  
"Replication Data for: \"Hug Fans or Follow Celebrities? How Nationalism Is Reinforced on Chinese Social Media\"",  
https://doi.org/10.7910/DVN/OAD399, Harvard Dataverse, V1,  
UNF:6:PG2epV5ZZZEn8gHKaMetIA== [fileUNF]

---

## Files Provided

1. **`analysis code.ipynb`**  
   - Jupyter Notebook with the complete analysis workflow.  
   - Contains preprocessing, fastText classification, and statistical analysis code.  

2. **`SA_Weibo_data_public.csv`**  
   - Core dataset of Weibo celebrity microblogs.  
   - Columns:  
     - `post_date`: Timestamp of the original microblog post.  
     - `userID`: Unique identifier of the user who created the post.  
     - `microblogID`: Unique identifier of the microblog entry.  
     - `no_of_comments`: Number of comments on the post.  
     - `no_of_likes`: Number of likes on the post.  
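This file contains only timestamps, identifiers, and engagement counts, so a typical first step is to aggregate it by day. The sketch below uses only the standard library. It assumes `post_date` is an ISO-8601 timestamp (for example `2019-03-05 14:22:10`) and that the count columns hold plain numbers. The function name `daily_engagement` is hypothetical and is not part of the replication package.

```python
import csv
from collections import defaultdict
from datetime import datetime


def daily_engagement(lines):
    """Sum posts, comments, and likes per calendar day.

    `lines` is any iterable of CSV text lines (e.g. an open file).
    Assumes `post_date` is ISO-8601, such as "2019-03-05 14:22:10".
    """
    totals = defaultdict(lambda: {"posts": 0, "comments": 0, "likes": 0})
    for row in csv.DictReader(lines):
        day = datetime.fromisoformat(row["post_date"]).date()
        totals[day]["posts"] += 1
        # int(float(...)) accepts both "12" and "12.0" style counts (assumption).
        totals[day]["comments"] += int(float(row["no_of_comments"]))
        totals[day]["likes"] += int(float(row["no_of_likes"]))
    # Return days in chronological order.
    return dict(sorted(totals.items()))
```

You could call it as `daily_engagement(f)` on a file opened with `open("SA_Weibo_data_public.csv", newline="", encoding="utf-8-sig")`. The `utf-8-sig` encoding reads the file correctly whether or not it starts with a byte-order mark.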

3. **`df_stat_fasttext.csv`**  
   - Daily aggregated nationalism scores for celebrities and fans, based on fastText classification.  
   - Columns:  
     - `date`: Calendar date (from January 1, 2019 to December 31, 2019).  
     - `celebrity`: Daily Nationalism Score for celebrities.  
     - `fan`: Daily Nationalism Score for fans.  
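This README does not define how the Daily Nationalism Score is computed from the fastText output. One plausible construction is the share of a day's classified texts that are predicted *Nationalist*, computed separately for celebrities and fans. The sketch below implements that construction as an illustrative assumption only; it may not match the authors' definition. The helper `daily_nationalism_share` and its `(date, role, label)` input layout are hypothetical.

```python
from collections import defaultdict

LABEL_PREFIX = "__label__"  # fastText's default label prefix


def daily_nationalism_share(records):
    """Hypothetical score: share of texts labeled Nationalist per day and role.

    `records` is an iterable of (date_string, role, predicted_label) tuples,
    where role is "celebrity" or "fan" (an assumed encoding).
    """
    counts = defaultdict(lambda: [0, 0])  # (date, role) -> [nationalist, total]
    for day, role, label in records:
        if label.startswith(LABEL_PREFIX):
            label = label[len(LABEL_PREFIX):]
        tally = counts[(day, role)]
        tally[1] += 1
        if label == "Nationalist":
            tally[0] += 1
    # Pivot into one dict per day, mirroring the `celebrity` / `fan` columns.
    rows = {}
    for (day, role), (hits, total) in counts.items():
        rows.setdefault(day, {})[role] = hits / total
    return dict(sorted(rows.items()))
```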

4. **`df_stat_fasttext_all_rename.csv`**  
   - Daily nationalism scores disaggregated by celebrity type (state-conformist vs. non-conformist), with descriptive column names.  
   - Columns:  
     - `date`: Calendar date (from January 1, 2019 to December 31, 2019).  
     - `Celebrity(State-conformist)`: Daily Nationalism Score for state-conformist celebrities.  
     - `Fan(State-conformist)`: Daily Nationalism Score for state-conformist fans.  
     - `Celebrity(Non-conformist)`: Daily Nationalism Score for non-conformist celebrities.  
     - `Fan(Non-conformist)`: Daily Nationalism Score for non-conformist fans.  
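The renamed headers pack two dimensions (author role and conformist group) into a single column name. The sketch below assumes the headers appear exactly as listed above. It reshapes rows read with `csv.DictReader` into long-format records. `to_long` is a hypothetical helper, not part of the notebook.

```python
import re

# Matches headers such as "Celebrity(State-conformist)" or "Fan(Non-conformist)".
HEADER_PATTERN = re.compile(r"^(Celebrity|Fan)\((State-conformist|Non-conformist)\)$")


def to_long(rows):
    """Reshape wide rows (dicts keyed by CSV header) into
    (date, role, group, score) tuples; empty cells become None."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            match = HEADER_PATTERN.match(column)
            if match is None:
                continue  # skips `date` and any unexpected index column
            role, group = match.groups()
            score = float(value) if value not in ("", None) else None
            long_rows.append((row["date"], role.lower(), group, score))
    return long_rows
```

To use it, call `to_long(csv.DictReader(f))` on the opened CSV file. Long format lets you group or plot by role and group without hard-coding the four column names. Empty cells are kept as `None` rather than silently dropped.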

5. **`training.data_partial.sampled.csv`**  
   - Balanced illustrative sample of 80 labeled comments (40 *Nationalist*, 40 *Not_Nationalist*).  
   - Columns:  
     - `commenter_user_id`: Unique identifier of the user who made the comment.  
     - `content`: Raw text of the comment.  
     - `label`: Category assigned (*Nationalist* / *Not_Nationalist*).  
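fastText's supervised mode reads one example per line: the label, written as `__label__<name>`, followed by space-separated tokens. The sketch below converts rows of this file into that format and tallies the labels. The tally should reproduce the 40/40 balance described above. Splitting text into individual characters is an assumption made here for unsegmented Chinese text, and the notebook may tokenize differently. `to_fasttext_lines` is a hypothetical helper.

```python
from collections import Counter


def _char_tokens(text):
    # Hypothetical default: one token per non-whitespace character.
    return [ch for ch in text if not ch.isspace()]


def to_fasttext_lines(rows, tokenize=_char_tokens):
    """Convert labeled rows into fastText supervised-training lines.

    Each row is a dict with `content` and `label` keys (as read by
    csv.DictReader). Returns (lines, label_counts).
    """
    lines = []
    label_counts = Counter()
    for row in rows:
        label = row["label"].strip()
        tokens = tokenize(row["content"])  # newlines are dropped as whitespace
        lines.append(f"__label__{label} " + " ".join(tokens))
        label_counts[label] += 1
    return lines, label_counts
```

If you write the returned lines to a text file, one per line, the result can serve as input to `fasttext.train_supervised(input=...)` in the `fasttext` Python package.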

---

## Notes on Data Sharing

- **Privacy**: Only `training.data_partial.sampled.csv` contains raw text. All other datasets omit or redact microblog post and comment text.  
- **Reproducibility**: All main results can be reproduced with the structured datasets and `analysis code.ipynb`.  
- **Transparency**: The sampled file illustrates the labeling scheme. The larger structured files let users replicate the downstream feature engineering and statistical analysis without exposing sensitive text.  
